class: center, middle, inverse, title-slide

.title[
# Workflows for Reproducible Research with R & Git
]
.subtitle[
## Dependency Management
]
.author[
### Johannes Breuer, Bernd Weiss, & Arnim Bleier
]
.date[
### 2023-11-17
]

---

layout: true

---

## Dependencies in `R`

Most `R` packages depend on other `R` packages. All `R` packages depend on `R`.

--

Both `R` and `R` packages have versions. Different versions of `R` packages may depend on different versions of `R` and different versions of other packages.

---

## Dependencies in `R`

<img src="Dependency_Management_files/figure-html/dep-graph-1.png" width="70%" style="display: block; margin: auto;" />

---

## Dependencies in `R`

<img src="../img/cran_usethis.png" width="175%" style="display: block; margin: auto;" />

<small><small>Source: https://cran.r-project.org/web/packages/usethis/index.html</small></small>

---

class: center, middle

["It's ~~turtles~~ software all the way down"](https://en.wikipedia.org/wiki/Turtles_all_the_way_down) 🐢

--

... or is it???

---

## Digging ⛏ into your tool stack 🛠

Your full `R` setup consists of:

1. Specific versions of `R` packages<sup>1</sup>
2. A specific version of `R`<sup>2</sup>
3. A specific version of your operating system
4. Specific hardware

.small[
[1] You can have different libraries with different (versions of) `R` packages.

[2] You can also have different versions of `R` installed on your machine.
]

---

## Further dependencies

<img src="../img/toppling-tower.jpg" width="30%" style="display: block; margin: auto;" />

<small><small>*Note*: We could also add system libraries between `R` and the OS (which are especially relevant in the [Linux/Unix world](https://www.tutorialspoint.com/operating_system/os_linux.htm)).</small></small>

---

## The danger of dependencies

<img src="https://imgs.xkcd.com/comics/dependency.png" width="50%" style="display: block; margin: auto;" />

<small><small>https://xkcd.com/2347/</small></small>

---

## What to "ship" 🚢?

- `R` code (+ underlying data)
  - this should include information about the packages used
- information about the version of `R` and the packages used

--

- your whole computational environment (focus of the next session)

- overall goal: preventing what is known as "code rot" & "works-on-my-machine errors" (WOMME)

---

## Dependency management solutions

As with almost everything in the `R` ecosystem, there are multiple solutions for dependency management:

- a manual approach
- [~~`checkpoint`~~](https://github.com/RevolutionAnalytics/checkpoint)<sup>1</sup>
- [**`groundhog`**](https://groundhogr.com/)
- [`renv`](https://rstudio.github.io/renv/)<sup>2</sup>
- [`rang`](https://github.com/gesistsa/rang)<sup>3</sup>

.small[
[1] No longer an option, as it relied on the *CRAN Time Machine snapshots* from the *Microsoft R Application Network* (MRAN), which was [retired in July 2023](https://techcommunity.microsoft.com/t5/azure-sql-blog/microsoft-r-application-network-retirement/ba-p/3707161).

[2] `renv` is the successor of [`packrat`](https://github.com/rstudio/packrat) (which is no longer maintained).

[3] Developed by members of the team [Transparent Social Analytics](https://www.gesis.org/en/institute/staff/orga/tiles/5/74?cHash=8fbd330b798c8dd7cb84097ddfd82054) at GESIS.
]

---

## Manual approach to dependency management

There is an easy-to-use manual solution for providing information about the packages and `R` version used in your project:

.small[

```r
sessionInfo()
```

```
## R version 4.3.2 (2023-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8
## [4] LC_NUMERIC=C LC_TIME=German_Germany.utf8
## 
## time zone: Europe/Berlin
## tzcode source: internal
## 
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
## 
## other attached packages:
## [1] depgraph_0.1.0 emo_0.0.0.9000 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.0 purrr_1.0.2
## [7] readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 tidyverse_2.0.0 knitr_1.44 sjPlot_2.8.15
## [13] scales_1.2.1 ggplot2_3.4.3 correlation_0.8.4 dplyr_1.1.3
## 
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.15.0 jsonlite_1.8.7 datawizard_0.9.0 magrittr_2.0.3
## [6] TH.data_1.1-2 estimability_1.4.1 farver_2.1.1 nloptr_2.0.3 rmarkdown_2.25
## [11] fs_1.6.3 vctrs_0.6.3 memoise_2.0.1 minqa_1.2.6 effectsize_0.8.6
## [16] webshot_0.5.5 htmltools_0.5.6.1 usethis_2.2.2 broom_1.0.5 sjmisc_2.8.9
## [21] sass_0.4.7 xaringan_0.28 bslib_0.5.1 fontawesome_0.5.2 htmlwidgets_1.6.2
## [26] sandwich_3.0-2 emmeans_1.8.8 zoo_1.8-12 cachem_1.0.8 uuid_1.1-1
## [31] igraph_1.5.1 ggnetwork_0.5.12 mime_0.12 lifecycle_1.0.3 woRkshoptools_0.1.0
## [36] pkgconfig_2.0.3 sjlabelled_1.2.0 Matrix_1.6-1.1 R6_2.5.1 fastmap_1.1.1
## [41] shiny_1.7.5 snakecase_0.11.1 digest_0.6.33 easypackages_0.1.0 colorspace_2.1-0
## [46] ps_1.7.5 pkgload_1.3.3 labeling_0.4.3 timechange_0.2.0 fansi_1.0.5
## [51] httr_1.4.7 compiler_4.3.2 remotes_2.4.2.1 bit64_4.0.5 withr_2.5.1
## [56] backports_1.4.1 performance_0.10.5 highr_0.10 pkgbuild_1.4.2 MASS_7.3-60
## [61] sjstats_0.18.2 sessioninfo_1.2.2 miniCRAN_0.2.16 tools_4.3.2 httpuv_1.6.11
## [66] glue_1.6.2 callr_3.7.3 nlme_3.1-163 promises_1.2.1 grid_4.3.2
## [71] generics_0.1.3 gtable_0.3.4 tzdb_0.4.0 hms_1.1.3 xml2_1.3.5
## [76] utf8_1.2.3 ggrepel_0.9.3 pillar_1.9.0 vroom_1.6.4 later_1.3.1
## [81] splines_4.3.2 lattice_0.21-9 survival_3.5-7 bit_4.0.5 tidyselect_1.2.0
## [86] miniUI_0.1.1.1 svglite_2.1.2 xfun_0.40 devtools_2.4.5 stringi_1.7.12
## [91] yaml_2.3.7 boot_1.3-28.1 xaringanExtra_0.7.0 kableExtra_1.3.4 evaluate_0.22
## [96] codetools_0.2-19 cli_3.6.1 xtable_1.8-4 parameters_0.21.2 systemfonts_1.0.5
## [101] munsell_0.5.0 processx_3.8.2 jquerylib_0.1.4 modelr_0.1.11 Rcpp_1.0.11
## [106] ggeffects_1.3.1 coda_0.19-4 parallel_4.3.2 ellipsis_0.3.2 assertthat_0.2.1
## [111] prettyunits_1.2.0 bayestestR_0.13.1 profvis_0.3.8 urlchecker_1.0.1 lme4_1.1-34
## [116] viridisLite_0.4.2 mvtnorm_1.2-3 insight_0.19.5 crayon_1.5.2 rlang_1.1.1
## [121] rvest_1.0.3 multcomp_1.4-25
```
]

---

## `groundhog`

[`groundhog`](https://groundhogr.com/) is a lightweight package that allows you to increase the reproducibility of your `R` scripts. It does so by installing and loading "packages & their dependencies as available on chosen date on CRAN".

---

## Using `groundhog`

All you need to do to use `groundhog` is specify the packages you want to use in your script and a date.

```r
# only needed once
install.packages("groundhog")
library(groundhog)
# packages used in the script
pkgs <- c("tidyverse", "janitor", "sjPlot")
# install (if needed) and load the versions that were on CRAN on this date
groundhog.library(pkgs, date = "2023-11-01")
```

---

## How `groundhog` works

From the [package website](https://groundhogr.com/back-end/): "groundhog relies on a database that contains virtually all package versions ever uploaded to CRAN, the date when they were published, and all dependencies."

`groundhog` also used to rely on MRAN. Since that has been retired, however, it now uses its own package repository: [GRAN: Groundhog R Archive Neighbor](https://groundhogr.com/gran/).
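
---

## How `groundhog` works: a toy sketch

The core idea of a date-based snapshot can be sketched in a few lines of `R`: for a given date, pick the most recent version of a package that had been published on or before that date. This is a simplified illustration under stated assumptions, not `groundhog`'s actual implementation: the package `pkgA`, its versions, and its release dates are made up, and the function `version_on_date()` is hypothetical.

```r
# Hypothetical release history of a made-up package "pkgA"
# (versions and dates are invented for illustration only)
releases <- data.frame(
  version   = c("1.0.0", "1.1.0", "1.2.0", "2.0.0"),
  published = as.Date(c("2021-03-02", "2022-01-15", "2022-09-30", "2023-06-12")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most recent version published on or before `date`
version_on_date <- function(releases, date) {
  date <- as.Date(date)
  available <- releases[releases$published <= date, ]
  if (nrow(available) == 0) {
    stop("No version was published on or before ", format(date))
  }
  available$version[which.max(as.numeric(available$published))]
}

version_on_date(releases, "2023-01-01")  # returns "1.2.0"
version_on_date(releases, "2023-11-01")  # returns "2.0.0"
```

In this simplified view, the same lookup would have to be repeated with the same date for every dependency, which is why the database also needs to know the dependencies of each package version.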
---

## How `groundhog` works

`groundhog` uses its own package library to install the packages you specified. From the documentation of the `restore.library()` function: "When groundhog installs a package, it installs it into groundhog's library."

```r
library(groundhog)
get.groundhog.folder()
```

```
## [1] "C:/Users/breuerjs/Documents/R_groundhog/groundhog_library/"
```

"Groundhog then immediately moves the installed package(s) (and their dependencies) to the default personal library."

```r
.libPaths()[1]
```

```
## [1] "C:/Users/breuerjs/AppData/Local/R/win-library/4.3"
```

.small[
*Note*: You can find a lot more technical information on the [`groundhog` website](https://groundhogr.com/) and within the help files for the `groundhog` functions.
]

---

## Reversing changes made by `groundhog`

You can reverse the changes made by `groundhog` to your default personal `R` package library:

```r
restore.library()
```

---

## Pros and cons of `groundhog`

*Pros:*

- can be easily used to make existing `R` scripts more reproducible
- does not require a "project-based workflow"<sup>1</sup>
- does not require any specific knowledge on the part of the reproducer

*Cons:*

- limited to packages from CRAN
- works with package snapshots from specific dates, not with specific package versions
  - in reality, our installed packages are often not up-to-date, so they may not match the snapshot for any single date
- installs specific package versions, but not specific versions of `R`

.small[
[1] While you, as someone who highly values reproducibility, of course use a project-based workflow, the people who want to reproduce your analysis might not 😉
]

---

## Choosing a date

The recommendation by the `groundhog` package authors for choosing a date is that "a good default is the first day of the month when starting your project".

Once you have added `groundhog.library()` to your script, re-run it to make sure it produces the expected results.
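
As a small sketch of the "first day of the month" recommendation, the hypothetical helper `first_of_month()` below turns the date on which you started your project into the first day of that month, formatted as "YYYY-MM-DD" (the format used for the `date` argument of `groundhog.library()`). The project start date is an assumption for illustration.

```r
# Hypothetical helper: first day of the month of a given (project start) date
first_of_month <- function(start_date) {
  format(as.Date(start_date), "%Y-%m-01")
}

# assumed project start date, for illustration only
groundhog_date <- first_of_month("2023-11-17")
groundhog_date  # returns "2023-11-01"

# the result could then be passed on to groundhog, e.g.:
# groundhog.library(pkgs, date = groundhog_date)
```

Storing the date in one variable at the top of a script makes it easy to see, and to change, which snapshot the script relies on.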

You can also update all of the packages you use and then specify the current date as the date for `groundhog.library()`.

---

## `groundhog` and `R` versions

The `groundhog` database also includes information on base `R` releases. As `groundhog` does not install specific versions of `R`, you should still state the version of `R` you used in your script.

```r
R.version.string
```

```
## [1] "R version 4.3.2 (2023-10-31 ucrt)"
```

You can find the release dates for all versions of `R` via the [CRAN archive page](https://cran.r-project.org/src/base/).

---

## Excursus: Updating `R`

As you probably know, you can download the most recent version of `R` from the [CRAN website](https://cran.r-project.org/). You can download older versions of `R` via the CRAN archive pages for [*Windows*](https://cran.r-project.org/bin/windows/base/old/) and [*Mac OS X*](https://cran-archive.r-project.org/bin/macosx/).<sup>1</sup>

.small[
[1] How you install and update `R` on Linux depends on your [distribution](https://cran.r-project.org/bin/linux/).
]

---

## Excursus: Updating `R`

On Windows, you can also use the [`installr` package](https://talgalili.github.io/installr/) to update `R`.<sup>1</sup>

```r
install.packages("installr")
library(installr)
updateR()
```

.small[
[1] The `installr` package also offers some other interesting functionality, such as installing `Git` via the `install.git()` function.
]

---

## Excursus: Updating packages

The easiest way of updating packages is simply to reinstall them with `install.packages()`. To update all packages or a specific set of packages, you can use the `update.packages()` function.
```r
# update all installed packages
update.packages()
# update specific packages
update.packages(oldPkgs = c("tidyverse", "janitor", "sjPlot"))
```

---

## Excursus: Updating packages

With the following code, you can detach and update all of the packages you have currently loaded in your `R` session (excluding the base `R` packages):

```r
# names of all attached packages (e.g., "package:dplyr")
loaded_pkgs <- grep("^package:", search(), value = TRUE)
# exclude base R packages (e.g., stats, utils, methods), which are updated together with R
base_pkgs <- paste0("package:", rownames(installed.packages(priority = "base")))
loaded_pkgs <- setdiff(loaded_pkgs, base_pkgs)
# detach (and, where possible, unload) the remaining packages
for (pkg in loaded_pkgs) {
  detach(pkg, character.only = TRUE, unload = TRUE)
}
# reinstall the packages to get their most recent CRAN versions
update_pkgs <- sub("^package:", "", loaded_pkgs)
install.packages(update_pkgs)
```

---

## Excursus: Updating packages

If you want to install a specific version of a package (not the most recent one), the easiest option is the `install_version()` function from the `remotes` package.

```r
# install.packages("remotes") # if not installed yet
library(remotes)
install_version("tidyverse", version = "1.3.0")
```

---

class: center, middle

# [Exercise](https://jobreu.github.io/reproducible-research-gesis-2023/exercises/Exercise_Dependency_Management.html) time 🏋️‍♀️💪🏃🚴

## [Solutions](https://jobreu.github.io/reproducible-research-gesis-2023/solutions/Exercise_Dependency_Management.html)

---

class: center, middle

# More comprehensive dependency management options in `R`

---

## `groundhog` vs. `renv`

See: https://groundhogr.com/renv/